NVLink is the high-speed interconnect that lets the GPUs and CPUs in accelerated systems exchange data on their way to practical results. Accelerated computing, once confined to high-performance computers at government research labs, is now widely accessible.
Banks, automakers, factories, hospitals, retailers and others are adopting AI supercomputers to manage the growing volumes of data they need to filter and understand. These powerful, efficient systems are the superhighways of computing. They use parallel pathways to carry calculations and data on a blazingly quick journey to practical results.
CPU and GPU processors supply the route's horsepower, while fast interconnects serve as its onramps. The premier interconnect for accelerated computing is called NVLink.
What Is NVLink?
NVLink is a high-speed connection for GPUs and CPUs, formed by a reliable software protocol that typically runs over several pairs of wires printed on a computer board. It lets processors send and receive data from shared pools of memory at lightning speed. Now in its fourth generation, NVLink connects host and accelerated processors at rates up to 900 gigabytes per second (GB/s).
That's more than seven times the bandwidth of PCIe Gen 5, the interconnect used in conventional x86 servers. And NVLink delivers five times the energy efficiency of PCIe Gen 5, thanks to data transfers that consume just 1.3 picojoules per bit.
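The figures above can be checked with a little back-of-envelope arithmetic. The sketch below assumes a PCIe Gen 5 x16 link at roughly 128 GB/s bidirectional (16 lanes at about 4 GB/s each), which is not stated in the text:

```python
# Back-of-envelope comparison of fourth-generation NVLink vs. PCIe Gen 5,
# using the figures cited in the article. The 128 GB/s PCIe number is an
# assumption (x16 link, ~4 GB/s per lane, bidirectional).
NVLINK4_GBPS = 900          # GB/s per GPU, fourth-generation NVLink
PCIE_GEN5_X16_GBPS = 128    # GB/s, assumed bidirectional x16 link
PJ_PER_BIT = 1.3            # picojoules per bit transferred over NVLink

ratio = NVLINK4_GBPS / PCIE_GEN5_X16_GBPS
print(f"bandwidth ratio: {ratio:.1f}x")        # ~7x, matching the article

# Energy to move one gigabyte over NVLink at 1.3 pJ/bit:
bits_per_gb = 8e9
joules_per_gb = PJ_PER_BIT * 1e-12 * bits_per_gb
print(f"energy per GB: {joules_per_gb * 1000:.1f} mJ")
```

The picojoule figure works out to roughly 10 millijoules per gigabyte moved, which is why the efficiency difference matters at data-center scale.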
NVLink's past
Since its original debut as a GPU interconnect with the NVIDIA P100 GPU, NVLink has advanced in lockstep with each new NVIDIA GPU architecture. NVLink first drew broad attention in the high performance computing community in 2018, when it was used to link the GPUs and CPUs in Summit and Sierra, two of the most powerful supercomputers in the world.
Installed at Oak Ridge and Lawrence Livermore National Laboratories, the systems are advancing science in fields ranging from drug discovery to disaster prediction.
Bandwidth doubles, then increases once more
The third-generation NVLink, which packed a dozen interconnects into each NVIDIA A100 Tensor Core GPU, raised the maximum bandwidth per GPU to 600GB/s in 2020. The A100 powers AI supercomputers in enterprise data centers, cloud computing services and HPC labs across the globe.
A single NVIDIA H100 Tensor Core GPU now carries eighteen fourth-generation NVLink interconnects. And the technology has taken on a new, strategic role connecting the most advanced CPUs and accelerators in the world.
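The per-GPU totals quoted here fall out of a simple product of link count and per-link rate. The 50 GB/s-per-link figure in this sketch is an assumption, chosen because it is consistent with both totals the article cites:

```python
# Per-GPU NVLink bandwidth as (number of links) x (per-link rate).
# The 50 GB/s-per-link figure is an assumption consistent with the
# totals quoted in the article: 12 x 50 = 600, 18 x 50 = 900.
PER_LINK_GBPS = 50

a100_links = 12  # third-generation NVLink, NVIDIA A100
h100_links = 18  # fourth-generation NVLink, NVIDIA H100

print("A100:", a100_links * PER_LINK_GBPS, "GB/s")  # 600 GB/s
print("H100:", h100_links * PER_LINK_GBPS, "GB/s")  # 900 GB/s
```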
A Link Between Chips
A board-level link called NVIDIA NVLink-C2C combines two processors into a single device, creating a superchip. For example, it joins two CPU chips to form the 144 Arm Neoverse V2 cores of the NVIDIA Grace CPU Superchip, a processor built to deliver energy-efficient performance for cloud, enterprise and HPC customers. NVIDIA NVLink-C2C also joins a Grace CPU and a Hopper GPU to create the Grace Hopper Superchip, which packs accelerated computing for the most demanding AI and HPC jobs into a single module.
Alps, an AI supercomputer planned for the Swiss National Supercomputing Centre, will be among the first to use Grace Hopper. When it comes online later this year, the high-performance system will take on enormous research challenges in fields such as astrophysics and quantum chemistry.
Grace and Grace Hopper are also great at saving energy on demanding cloud computing workloads.
For example, the Grace Hopper processor is an ideal fit for recommender systems. These economic engines of the internet need fast, efficient access to vast volumes of data to serve trillions of results a day to billions of users.
NVLink is also used in a powerful system-on-chip for automakers that integrates NVIDIA Hopper, Grace and Ada Lovelace processors. NVIDIA DRIVE Thor is a car computer that unifies intelligent functions such as automated driving, parking, the digital instrument cluster, entertainment and more into a single architecture.
LEGO Computer Links
NVLink works like the socket stamped into a LEGO brick: it provides the foundation for building supersystems that can tackle the toughest AI and HPC jobs. For example, the NVLinks on the eight GPUs in an NVIDIA DGX system use NVSwitch chips to establish fast, direct connections. Together, they form an NVLink network where every GPU in the server works as a single unit.
To achieve even greater performance, DGX servers themselves can be combined into modular units of 32 systems, creating a powerful, efficient computing cluster.
Users can unite such a modular block of 32 DGX systems into a single AI supercomputer using the NVLink network inside each DGX and an NVIDIA Quantum-2 switched InfiniBand fabric between them. For example, an NVIDIA DGX H100 SuperPOD packs 256 H100 GPUs to deliver up to an exaflop of peak AI performance.
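The SuperPOD's headline numbers follow directly from its building blocks. In this sketch, the roughly 4 petaflops of FP8 AI performance per H100 GPU is an assumption, picked because it is consistent with the "up to an exaflop" claim above:

```python
# How a DGX H100 SuperPOD's GPU count and peak AI throughput follow
# from its building blocks. The ~4 PFLOPS-per-GPU FP8 figure is an
# assumption chosen to match the article's "up to an exaflop" claim.
GPUS_PER_DGX = 8        # H100 GPUs in one DGX system
DGX_PER_SUPERPOD = 32   # DGX systems in one SuperPOD block
FP8_PFLOPS_PER_GPU = 4  # assumed peak FP8 AI throughput per H100

gpus = GPUS_PER_DGX * DGX_PER_SUPERPOD
print("GPUs:", gpus)                                  # 256
print("peak AI:", gpus * FP8_PFLOPS_PER_GPU, "PFLOPS")  # ~1 exaflop
```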
For still more performance, users can tap cloud-based AI supercomputers, such as the one Microsoft Azure is building with tens of thousands of A100 and H100 GPUs. Companies like OpenAI use such services to train some of the largest generative AI models in the world.
It's yet another example of what accelerated computing can accomplish.